Social media scraping

Netvizz is a Facebook application hosted on a server provided by the Digital Methods Initiative at the University of Amsterdam (the makers of TCAT). The paper “Studying Facebook via Data Extraction: The Netvizz Application” (WebSci '13), by the developer Bernhard Rieder, describes the tool's fundamentals and its theoretical and methodological background.

The tool provides several modules for data extraction. We will start with the page data module.

Description of workshop

In this workshop we will explore Facebook data in a different way than through the “normal” Facebook interface. This will happen in two stages:

  • Collect data from a Facebook Page through the Facebook API with the tool Netvizz.
  • Visualize & explore the data from a Facebook Page with Gephi.

The purpose is to become familiar with Netvizz and to get an idea of the questions we can ask with these tools.

Learning Outcomes

  • Become familiar with using Netvizz.
  • Get a sense of how Facebook constructs the way we view data and normalizes certain ways of looking at data.

Step-by-step guide for collecting and exploring Facebook Data

Find a Facebook page that is relevant for your project or that you find interesting. Copy the URL of the Facebook page (this URL will be used in step 3).

Use Netvizz to collect data from a Facebook Page:

  1. Log in to your personal Facebook account and go to https://apps.facebook.com/netvizz/. From this link you install the app by allowing it to access public information from your profile. After clicking on the app, you should see the same screen as below.
  2. Click on the “page data” link marked by the red square below.
  3. In order to collect the data from your Facebook Page you need the “page id”. Click on the link “find page ids here”. If the link doesn’t work, you can use http://findmyfbid.com/ instead. Type in the full URL of the Facebook page you want to explore to receive a “numeric ID”.
  4. Enter the “numeric ID” in the textbox next to “page id:” (see screenshot below). Choose last 50 posts. Choose full data. Click on the button “posts by page and users”.
  5. Wait for Netvizz to collect the data from the Facebook page (the time depends on the “data scope” selected above and on the activity at the given Facebook page). When it's done, download the zip archive to your computer.
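Once the archive is downloaded, its contents can be checked without unpacking it by hand. The following is a minimal Python sketch, not part of Netvizz itself. The helper name files_by_extension and the file names in the demo archive are hypothetical; the real names depend on the page and the options you selected.

```python
import io
import zipfile
from collections import defaultdict
from pathlib import PurePosixPath


def files_by_extension(archive):
    """Group the file names inside a zip archive by their extension."""
    groups = defaultdict(list)
    with zipfile.ZipFile(archive) as zf:
        for name in zf.namelist():
            if name.endswith("/"):
                continue  # skip directory entries
            groups[PurePosixPath(name).suffix.lower()].append(name)
    return dict(groups)


# Demo: build a small in-memory archive with made-up file names,
# standing in for the zip file downloaded from Netvizz.
demo = io.BytesIO()
with zipfile.ZipFile(demo, "w") as zf:
    zf.writestr("page_example.gdf", "nodedef>name VARCHAR\n")
    zf.writestr("page_example_posts.tab", "id\ttext\n")
demo.seek(0)

groups = files_by_extension(demo)
for extension, names in sorted(groups.items()):
    print(extension, names)
```

To inspect a real download, pass the path of the zip file on your computer to files_by_extension instead of the in-memory demo archive.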

In this workshop we will be focusing on the file with the extension .gdf. The other files, with the extension .tab, can be viewed in any spreadsheet software such as Excel, Numbers or Google Sheets.
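To get a feeling for what a .gdf file contains before opening it in Gephi, it can also be read as plain text. The sketch below assumes the general GDF layout: a line starting with nodedef> that names the node columns, followed by node rows, then a line starting with edgedef> followed by edge rows, all comma-separated, with double quotes around values that contain commas. The function name parse_gdf and the sample data are invented for illustration; the columns Netvizz actually writes are not listed in this guide and may differ.

```python
import csv
import io


def parse_gdf(text):
    """Split GDF text into a list of node dicts and a list of edge dicts."""
    nodes, edges = [], []
    columns, target = [], None
    for row in csv.reader(io.StringIO(text), skipinitialspace=True):
        if not row:
            continue  # ignore blank lines
        first = row[0]
        if first.startswith("nodedef>") or first.startswith("edgedef>"):
            # A definition line switches between the node and edge sections.
            target = nodes if first.startswith("nodedef>") else edges
            row[0] = first.split(">", 1)[1]
            # Each header cell looks like "name VARCHAR"; keep only the name.
            columns = [cell.split()[0] for cell in row]
            continue
        if target is not None:
            target.append(dict(zip(columns, row)))
    return nodes, edges


# Invented sample data, not real Netvizz output.
SAMPLE_GDF = """nodedef>name VARCHAR,label VARCHAR,type VARCHAR
p1,"First post, with a comma",post
u1,User one,user
edgedef>node1 VARCHAR,node2 VARCHAR,weight DOUBLE
u1,p1,2
"""

nodes, edges = parse_gdf(SAMPLE_GDF)
print(len(nodes), "nodes and", len(edges), "edges")
for node in nodes:
    print(node["name"], "->", node["label"])
```

All values are kept as strings here. The .tab files can be read in a similar way with csv.reader(..., delimiter="\t").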